61 research outputs found

    Towards Better Language Modeling: Taking Sequences into Account in Statistical Models

    Peer-reviewed national conference paper. In natural language, we find many key word sequences that reflect the structure of a sentence. These sequences are of variable length and contribute to natural elocution. To take these sequences into account during speech recognition, we treated them as units and added them to the base vocabulary. Consequently, language models built on this new vocabulary rely on a history of units, each of which may be either a word or a sequence. In this paper, we present an original method for extracting linguistically viable word sequences; the method is grounded in information theory. We also describe several language models based on these sequences. Evaluation was carried out with a 20,000-word dictionary and a 43-million-word corpus. Using sequences improved perplexity by about 23% and reduced the word error rate of our MAUD speech recognition system by about 20%.
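The information-theoretic extraction step described above can be approximated with pointwise mutual information over adjacent word pairs. The sketch below is a minimal illustration, not the paper's actual algorithm; the threshold and minimum count are invented for the example.

```python
import math
from collections import Counter

def extract_sequences(tokens, min_count=2, pmi_threshold=1.0):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Pairs scoring above the threshold are candidate multi-word units
    to add to the vocabulary. Threshold and counts are illustrative.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    candidates = []
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue
        # PMI = log P(w1, w2) / (P(w1) * P(w2))
        pmi = math.log((c / (n - 1)) / ((unigrams[w1] / n) * (unigrams[w2] / n)))
        if pmi >= pmi_threshold:
            candidates.append(((w1, w2), pmi))
    return sorted(candidates, key=lambda x: -x[1])
```

High-scoring pairs would then be merged into single vocabulary units and the criterion re-applied, yielding variable-length sequences after several passes.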

    Statistical language modeling based on variable-length sequences

    In natural language, and especially in spontaneous speech, people often group words to form phrases which become usual expressions. This is due to phonological reasons (to make pronunciation easier) or to semantic ones (to remember a phrase more easily by assigning a meaning to a block of words). Classical language models do not adequately take such phrases into account. A better approach consists in modeling some word sequences as if they were individual dictionary elements. Sequences are considered as additional entries of the vocabulary, on which language models are computed. In this paper, we present a method for automatically retrieving the most relevant phrases from a corpus of written sentences. The originality of our approach resides in the fact that the extracted phrases are obtained from a linguistically tagged corpus; the obtained phrases are therefore linguistically viable. To measure the contribution of classes to phrase retrieval, we implemented the same algorithm without using classes; the class-based method outperformed the class-free one by 11%. Our approach uses information-theoretic criteria, which ensure a high statistical consistency and make the decision to select a potential sequence optimal with respect to language perplexity. We propose several variants of language models with and without word sequences. Among them, we present a model in which the trigger pairs are linguistically more significant. We show that the use of sequences decreases the word error rate and improves the normalized perplexity. For instance, the best sequence model improves perplexity by 16%, and the accuracy of our dictation system (MAUD) by approximately 14%. Experiments, in terms of perplexity and recognition rate, were carried out on a vocabulary of 20,000 words extracted from a corpus of 43 million words made up of two years of the French newspaper Le Monde.
The acoustic model (HMM) is trained with the Bref80 corpus. © 2002.
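The trigger pairs mentioned above are word (or sequence) pairs whose co-occurrence carries high mutual information. A crude, sentence-level stand-in for such scoring is sketched below; the windowing and the absence of smoothing are simplifications, not the paper's actual trigger model.

```python
import math
from collections import Counter

def trigger_pairs(sentences, top_k=5):
    """Score ordered (trigger, target) pairs by the mutual-information
    contribution of their co-occurrence within the same sentence.
    A simplified illustration of trigger-pair selection."""
    n = len(sentences)
    occ = Counter()   # sentences containing each word
    co = Counter()    # sentences containing each ordered pair
    for s in sentences:
        words = set(s)
        for w in words:
            occ[w] += 1
        for a in words:
            for b in words:
                if a != b:
                    co[(a, b)] += 1
    scored = []
    for (a, b), c in co.items():
        p_ab = c / n
        p_a, p_b = occ[a] / n, occ[b] / n
        # MI contribution: P(a,b) * log( P(a,b) / (P(a) * P(b)) )
        scored.append(((a, b), p_ab * math.log(p_ab / (p_a * p_b))))
    scored.sort(key=lambda x: -x[1])
    return scored[:top_k]
```

Replacing single words with extracted sequences as the units being paired is what distinguishes the sequence-based trigger model described in these abstracts.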

    Variable-Length Class Sequences Based on a Hierarchical Approach: MCnv

    International audience. In this paper, we describe a new language model based on dependent word sequences organized in a multi-level hierarchy. We call this model MCnv, where n is the maximum number of words in a sequence and v is the maximum number of levels. The originality of this model is its capability to take dependent variable-length sequences into account for very large vocabularies. In order to discover the variable-length sequences and to build the hierarchy, we use a set of 233 syntactic classes derived from the eight elementary grammatical classes of French. The MCnv model learns hierarchical word patterns and uses them to re-evaluate and filter the n-best utterance hypotheses output by our speech recognizer MAUD. The model has been trained on a corpus (LeM) of 43 million words extracted from "Le Monde", a French newspaper, and uses a vocabulary of 20,000 words. Tests were conducted on 300 sentences. Results show a 17% decrease in perplexity compared to an interpolated class trigram model. Rescoring the original n-best hypotheses also yields a 5% improvement in accuracy.
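The rescoring step above (re-ranking the recognizer's n-best hypotheses with a second language model) has a simple generic shape, sketched here. The interpolation weight and the score convention (higher is better) are illustrative assumptions, not MAUD's actual configuration.

```python
def rescore_nbest(hypotheses, lm_score, lm_weight=0.5):
    """Re-rank n-best hypotheses by interpolating the recognizer's
    original score with a new language-model score.

    hypotheses: list of (text, base_score) pairs, higher score = better.
    lm_score:   callable returning the new LM's score for a hypothesis.
    """
    rescored = [
        (h, (1 - lm_weight) * base + lm_weight * lm_score(h))
        for h, base in hypotheses
    ]
    return sorted(rescored, key=lambda x: -x[1])
```

In the setting described above, `lm_score` would be the hierarchical MCnv model's log-probability; a hypothesis that the baseline recognizer ranked second can overtake the first if the richer model strongly prefers it.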

    Beyond the Conventional Statistical Language Models: The Variable-Length Sequences Approach

    Peer-reviewed international conference paper. In natural language, several sequences of words are very frequent. A classical language model, like the n-gram, does not adequately take such sequences into account, because it underestimates their probabilities. A better approach consists in modeling word sequences as if they were individual dictionary elements. Sequences are considered as additional entries of the word lexicon, on which language models are computed. In this paper, we present an original method for automatically determining the most important phrases in corpora. This method is based on information-theoretic criteria, which ensure a high statistical consistency, and on French grammatical classes, which encode additional types of linguistic dependencies. In addition, perplexity is used to make the decision to select a potential sequence more accurate. We also propose several variants of language models with and without word sequences. Among them, we present a model in which the trigger pairs are linguistically more significant. The originality of this model, compared with commonly used trigger approaches, is the use of word sequences to estimate the trigger pairs without being limited to single words. Experimental tests, in terms of perplexity and recognition rate, were carried out on a vocabulary of 20,000 words and a corpus of 43 million words. The use of the word sequences proposed by our algorithm reduces perplexity by more than 16% compared to models limited to single words. Introducing these word sequences into our dictation machine improves accuracy by approximately 15%.
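Perplexity, the selection and evaluation criterion used throughout these papers, can be computed from any language model. The sketch below uses an add-one-smoothed bigram model, and normalizes by the underlying word count so that merging sequences (units written with `_`) does not artificially deflate the score; the smoothing choice and the `_` convention are illustrative assumptions.

```python
import math
from collections import Counter

def normalized_perplexity(train, test, vocab_size):
    """Add-one-smoothed bigram perplexity of `test`, normalized by the
    number of underlying words rather than units, so models with merged
    multi-word units remain comparable to word-only models."""
    bigrams = Counter(zip(train, train[1:]))
    unigrams = Counter(train)
    logp = 0.0
    for w1, w2 in zip(test, test[1:]):
        logp += math.log((bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size))
    # A unit like "new_york" counts as two words for normalization.
    n_words = sum(u.count("_") + 1 for u in test)
    return math.exp(-logp / n_words)
```

This normalization is the reason the abstracts speak of "normalized perplexity": without it, a model over longer units would trivially look better per token.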

    Variable-Length Sequence Language Model for Large Vocabulary Continuous Dictation Machine

    Peer-reviewed conference paper. In natural language, some sequences of words are very frequent. A classical language model, like the n-gram, does not adequately take such sequences into account, because it underestimates their probabilities. A better approach consists in modeling word sequences as if they were individual dictionary elements. Sequences are considered as additional entries of the word lexicon, on which language models are computed. In this paper, we present two methods for automatically determining frequent phrases in unlabeled corpora of written sentences. These methods are based on information-theoretic criteria which ensure a high statistical consistency. Our models reach a local optimum, since they minimize perplexity. One procedure is based only on the n-gram language model to extract word sequences. The second is based on a class n-gram model trained on 233 classes derived from the eight grammatical classes of French. Experimental tests, in terms of perplexity and recognition rate, were carried out on a vocabulary of 20,000 words and a corpus of 43 million words extracted from the newspaper "Le Monde". Our models reduce perplexity by more than 20% compared with n-gram (n ≥ 3) and multigram models. In terms of recognition rate, our models outperform n-gram and multigram models.
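Once sequences are accepted as lexicon entries, the corpus must be re-segmented so that known sequences become single units. A greedy longest-match pass is one simple way to do this; the `_` joining convention and the fixed sequence set are illustrative (in the papers, the inventory is learned, not given).

```python
def tokenize_with_sequences(words, sequences):
    """Greedy longest-match segmentation: known multi-word sequences
    (given as tuples of words) are emitted as single vocabulary units
    joined with '_'; everything else stays a plain word."""
    out, i = [], 0
    max_len = max((len(s) for s in sequences), default=1)
    while i < len(words):
        # Try the longest possible sequence starting at position i first.
        for k in range(min(max_len, len(words) - i), 1, -1):
            cand = tuple(words[i:i + k])
            if cand in sequences:
                out.append("_".join(cand))
                i += k
                break
        else:
            out.append(words[i])
            i += 1
    return out
```

An n-gram model is then trained on the resulting unit stream, so a history position can hold either a word or a whole sequence, exactly as these abstracts describe.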

    MAUD: A Prototype Voice Dictation Machine

    Book chapter. Voice entry of texts and data is an important application area for automatic speech recognition systems, as well as one of their favorite test beds. The RFIA/Syco team at CRIN/CNRS-INRIA Lorraine is currently developing a new version of its speaker-independent, large-vocabulary dictation system MAUD (Machine AUtomatique à Dicter), based essentially on a stochastic approach. MAUD is characterized by original models and algorithms, notably second-order Markov acoustic models and a locally optimal lexical recognition algorithm. Increasingly strong syntactic constraints are applied to compensate for the shortcomings of the statistical model. To this end, we use a unification grammar that accounts for gender and number agreement. We conclude with preliminary results on the development corpus of the AUPELF-UREF B1 project.
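The gender/number agreement constraint mentioned above is naturally expressed as feature unification, where an underspecified feature matches anything. The toy lexicon and two-feature scheme below are invented for illustration and are not MAUD's actual grammar.

```python
def unify(f1, f2):
    """Unify two feature values; None is underspecified and matches anything."""
    if f1 is None:
        return f2
    if f2 is None:
        return f1
    return f1 if f1 == f2 else "FAIL"

def agrees(det, noun, lexicon):
    """Check determiner-noun agreement on (gender, number) features.
    A hypothesis violating agreement could be filtered or penalized."""
    gd, nd = lexicon[det]
    gn, nn = lexicon[noun]
    return unify(gd, gn) != "FAIL" and unify(nd, nn) != "FAIL"
```

For example, the plural determiner "les" is unspecified for gender, so it unifies with both masculine and feminine plural nouns, while "le maison" fails on gender.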

    Automatic Online Evaluation of Intelligent Assistants

    Voice-activated intelligent assistants, such as Siri, Google Now, and Cortana, are prevalent on mobile devices. However, they are challenging to evaluate due to the varied and evolving set of tasks they support, e.g., voice command, web search, and chat. Since each task may have its own procedure and a unique form of correct answer, it is expensive to evaluate each task individually. This paper is the first attempt to solve this challenge. We develop consistent and automatic approaches that can evaluate different tasks in voice-activated intelligent assistants. We use implicit feedback from users to predict whether users are satisfied with the intelligent assistant as well as with its components, i.e., speech recognition and intent classification. Using this approach, we can potentially evaluate and compare different tasks within and across intelligent assistants according to the predicted user satisfaction rates. Our approach is characterized by an automatic scheme for categorizing user-system interaction into task-independent dialog actions, e.g., the user is commanding, selecting, or confirming an action. We use the action sequence in a session to predict user satisfaction and the quality of speech recognition and intent classification. We also incorporate other features to further improve our approach, including features derived from previous work on web search satisfaction prediction and features utilizing acoustic characteristics of voice requests. We evaluate our approach using data collected from a user study. Results show our approach can accurately identify satisfactory and unsatisfactory sessions.
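The core idea above (predicting satisfaction from a session's sequence of task-independent dialog actions) can be caricatured as a linear score over action counts. The action names and weights below are invented for illustration; the paper's model is learned from user-study data, not hand-weighted.

```python
def predict_satisfaction(session, weights=None):
    """Classify a session (list of dialog-action labels) as satisfactory
    when the average per-action weight is positive. A crude linear
    stand-in for a learned satisfaction classifier."""
    weights = weights or {
        # Hypothetical action inventory: smooth progress scores positive,
        # repeats and reformulations (signs of failure) score negative.
        "command": 0.5, "select": 0.5, "confirm": 1.0,
        "repeat": -1.0, "reformulate": -1.5,
    }
    score = sum(weights.get(a, 0.0) for a in session)
    return score / max(len(session), 1) > 0
```

A real system would learn such weights (and richer sequence features) from labeled sessions, alongside the acoustic and web-search-derived features the abstract mentions.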